Longer Features: They do a speech detector good
Abstract
We have incorporated spectrotemporal features into a speech activity detection (SAD) task on the Speech in Noisy Environments 2 (SPINE2) data set. The features were generated by applying 2D Gabor filters to the mel spectrogram to measure the strength of various spectral and temporal modulation frequencies in different patches of the spectrogram. Across several different back-ends, the Gabor features significantly outperformed MFCCs, yielding relative reductions in equal error rate (EER) of 40% to 50%. Compared with the other back-ends, AdaBoost with tree stumps performed particularly well with Gabor features and particularly poorly with MFCCs. An investigation into the reasons for this disparity suggests that the most useful features for SAD incorporate information over longer time scales.
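The abstract describes the features only at a high level: 2D Gabor filters applied to the mel spectrogram, each measuring how strongly one spectral/temporal modulation frequency is present in a patch. Below is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes one common Gabor construction (a complex sinusoidal carrier under a separable Hann envelope, with the DC component removed), and the function names `gabor_kernel` and `gabor_response`, the kernel sizes, and the modulation frequencies are all hypothetical. A real front end would first compute a (log) mel spectrogram and apply a bank of such filters at many modulation frequencies.

```python
# Hypothetical sketch: function names and parameter values are illustrative assumptions.
import cmath
import math


def hann(length):
    """Hann window without its zero-valued endpoints, so every tap contributes."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1))
            for i in range(length)]


def gabor_kernel(omega_k, omega_n, size_k, size_n):
    """Complex 2D Gabor kernel of size_k mel channels x size_n frames.

    omega_k: spectral modulation frequency in radians per mel channel.
    omega_n: temporal modulation frequency in radians per frame.
    A complex sinusoidal carrier is shaped by a separable Hann envelope (assumed).
    """
    env_k, env_n = hann(size_k), hann(size_n)
    ck, cn = (size_k - 1) / 2, (size_n - 1) / 2
    envelope = [[env_k[k] * env_n[n] for n in range(size_n)] for k in range(size_k)]
    kernel = [[envelope[k][n] * cmath.exp(1j * (omega_k * (k - ck) + omega_n * (n - cn)))
               for n in range(size_n)] for k in range(size_k)]
    # Subtract the envelope-weighted mean so the kernel sums to zero:
    # a constant spectrogram level then produces no response.
    dc = sum(map(sum, kernel)) / sum(map(sum, envelope))
    return [[kernel[k][n] - dc * envelope[k][n] for n in range(size_n)]
            for k in range(size_k)]


def gabor_response(spec, kernel):
    """Magnitude of the 'valid' 2D correlation of spec (channels x frames) with kernel.

    Each output value measures how strongly the kernel's spectral/temporal
    modulation is present in one kernel-sized patch of the spectrogram.
    """
    n_ch, n_fr = len(spec), len(spec[0])
    kk, kn = len(kernel), len(kernel[0])
    out = []
    for k0 in range(n_ch - kk + 1):
        row = []
        for n0 in range(n_fr - kn + 1):
            acc = sum(spec[k0 + i][n0 + j] * kernel[i][j]
                      for i in range(kk) for j in range(kn))
            row.append(abs(acc))
        out.append(row)
    return out


if __name__ == "__main__":
    frames = 64
    # Toy one-channel "spectrogram" whose energy fluctuates with a period of 8 frames.
    modulated = [[math.cos(2 * math.pi * n / 8) for n in range(frames)]]
    flat = [[5.0] * frames]
    # Purely temporal filter tuned to the same 8-frame period, spanning 16 frames.
    kern = gabor_kernel(omega_k=0.0, omega_n=2 * math.pi / 8, size_k=1, size_n=16)
    print(max(gabor_response(modulated, kern)[0]))  # clearly nonzero: modulation matches
    print(max(gabor_response(flat, kern)[0]))       # essentially zero: no modulation
```

Running it prints a clearly nonzero value for the modulated input and essentially zero for the flat one. Because each kernel spans 16 frames, every feature summarizes a longer stretch of time than a single frame of MFCCs, which is the kind of longer-time-scale information the abstract's conclusion points to.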
Related articles
Design and realisation of an audiovisual speech activity detector
For many speech telecommunication technologies, a robust speech activity detector is important. An audio-only speech detector will give false positives when the interfering signal is speech or has speech characteristics. The video modality is well suited to solving this problem. This report presents the approach to, and implementation of, a decision-based audiovisual speech detector. Acoustic and v...
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech from noise is termed voice activity detection (VAD). This paper describes a new approach to VAD that is based on the Wavelet ...
Non-Speech Segment Rejection Based on Chaotic and Prosodic Features
To perform acceptably in noisy environments, speech recognition systems can be equipped with a preprocessing unit: a speech/non-speech detector. The detected speech segments are then passed to the speech recognition core. In this work, we introduce a modified speech/non-speech detection system based on prosodic features characterizing the smoothness of the peak index ser...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
The effect of redesign workstation on Speech Interference Level (SIL) among bank tellers
Abstract Background: There is always an interaction between people and their environment that can cause physical, physiological, and psychological stress, as well as discomfort and annoyance, and can have direct and indirect effects on their performance, productivity, health, and safety. People in their workplace are exposed to many factors related to work activities and environmen...
Publication date: 2012